Tidy Tuesday: US Populated Places
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Today’s TidyTuesday is about place names as recorded by the US Board on Geographic Names. The dataset has been cleaned to include only populated places.
This week will involve more libraries than normal, since I am going to play with mapping.
library(tidyverse) # who doesn't want to be tidy?
library(ggthemes)  # more themes for ggplot
library(gt)        # for nice tables
library(ggrepel)   # to help position labels in ggplot graphs
library(openxlsx)  # importing Excel files from a URL
library(fuzzyjoin) # for joining on inexact matches
library(sf)        # for handling geo data
library(mapview)   # quick interactive mapping
library(leaflet)   # more mapping
Load the dataset in the usual way.
tuesdata <- tidytuesdayR::tt_load(2023, week = 26)
us_place_names <- tuesdata$`us_place_names`
us_place_history <- tuesdata$`us_place_history`
I’d like to look at the places local to me. The dataset contains two dataframes: one with geographic details about each location and the other with commentary such as a description and history.
va <- us_place_names %>%
  filter(state_name == "Virginia")
va <- va %>%
  filter(county_name == "Arlington")
va_joined <- va %>%
  left_join(us_place_history, by = join_by(feature_id))
I don’t need the state name, county name, or county number, since I am dealing with a single county. So I am removing them from the dataset and then viewing what I have.
va_joined %>%
  select(-state_name, -county_name, -county_numeric) %>%
  gt()
feature_id | feature_name | date_created | date_edited | prim_lat_dec | prim_long_dec | description | history |
---|---|---|---|---|---|---|---|
1471986 | Overlee Knolls | 1979-09-28 | 2022-06-07 | 38.88956 | -77.14776 | NA | NA |
1492448 | Addison Heights | 1979-09-28 | 2022-06-07 | 38.85567 | -77.06026 | NA | NA |
1492455 | Alcova Heights | 1979-09-28 | 2022-06-07 | 38.86456 | -77.09720 | NA | NA |
1492483 | Arlington Forest | 1979-09-28 | 2022-06-07 | 38.86872 | -77.11303 | NA | NA |
1492484 | Arlington Heights | 1979-09-28 | 2022-06-07 | 38.86956 | -77.09220 | NA | NA |
1492485 | Arlington Village | 1979-09-28 | 2022-06-07 | 38.86178 | -77.08526 | NA | NA |
1492487 | Arna Valley | 1979-09-28 | 2022-06-07 | 38.84428 | -77.07637 | NA | NA |
1492496 | Aurora Hills | 1979-09-28 | 2022-06-07 | 38.85150 | -77.06414 | NA | NA |
1492512 | Barcroft | 1979-09-28 | 2022-06-07 | 38.85595 | -77.10387 | NA | NA |
1492597 | Bluemont Junction | 1979-09-28 | 2022-06-07 | 38.87483 | -77.13331 | NA | NA |
1492606 | Bon Air | 1979-09-28 | 2022-06-07 | 38.87317 | -77.12665 | NA | NA |
1492659 | Buckingham | 1979-09-28 | 2022-06-07 | 38.87345 | -77.10665 | NA | NA |
1492771 | Claremont | 1979-09-28 | 2022-06-07 | 38.84317 | -77.10470 | NA | NA |
1492797 | Columbia Forest | 1979-09-28 | 2022-06-07 | 38.85400 | -77.11026 | NA | NA |
1492798 | Columbia Heights | 1979-09-28 | 2022-06-07 | 38.85761 | -77.12109 | NA | NA |
1492877 | Douglass Park | 1979-09-28 | 2022-06-07 | 38.84983 | -77.09303 | NA | NA |
1492958 | Fort Barnard Heights | 1979-09-28 | 2022-06-07 | 38.84650 | -77.08942 | NA | NA |
1493006 | Glencarlyn | 1979-09-28 | 2011-05-11 | 38.86178 | -77.12915 | NA | NA |
1493353 | North Fairlington | 1979-09-28 | 2022-06-07 | 38.83650 | -77.09720 | NA | NA |
1493397 | Parkglen | 1979-09-28 | 2022-06-07 | 38.85595 | -77.11637 | NA | NA |
1493586 | Shirlington | 1979-09-28 | 2022-06-07 | 38.84178 | -77.08831 | NA | NA |
1493630 | South Fairlington | 1979-09-28 | 2022-06-07 | 38.83261 | -77.08970 | NA | NA |
1493744 | Virginia Heights | 1979-09-28 | 2022-06-07 | 38.85095 | -77.11637 | NA | NA |
1493745 | Virginia Highlands | 1979-09-28 | 2022-06-07 | 38.85845 | -77.06470 | NA | NA |
1493784 | Westmont | 1979-09-28 | 2022-06-07 | 38.86261 | -77.09192 | NA | NA |
1495188 | Allencrest | 1979-09-28 | 2022-06-07 | 38.89344 | -77.15026 | NA | NA |
1495260 | Berkshire | 1979-09-28 | 2022-06-07 | 38.89789 | -77.15137 | NA | NA |
1495429 | Country Club Hills | 1979-09-28 | 2022-06-07 | 38.91400 | -77.13081 | NA | NA |
1495430 | Country Club Manor | 1979-09-28 | 2022-06-07 | 38.91372 | -77.13776 | NA | NA |
1495438 | Crescent Hills | 1979-09-28 | 2022-06-07 | 38.90483 | -77.14581 | NA | NA |
1495472 | Dominion Hills | 1979-09-28 | 2022-06-07 | 38.87595 | -77.14109 | NA | NA |
1495490 | East Falls Church | 1979-09-28 | 2022-06-07 | 38.88733 | -77.15442 | NA | NA |
1495579 | Garden City | 1979-09-28 | 2022-06-07 | 38.90011 | -77.13526 | NA | NA |
1495641 | Halls Hill | 1979-09-28 | 2022-06-07 | 38.89761 | -77.12859 | NA | NA |
1495692 | Highview Park | 1979-09-28 | 2022-06-07 | 38.89372 | -77.12748 | NA | NA |
1495804 | Lacey Forest | 1979-09-28 | 2022-06-07 | 38.88289 | -77.12915 | NA | NA |
1495821 | Larchmont | 1979-09-28 | 2022-06-07 | 38.88650 | -77.12776 | NA | NA |
1495887 | Madison Manor | 1979-09-28 | 2022-06-07 | 38.88039 | -77.14720 | NA | NA |
1496037 | Oakwood | 1979-09-28 | 2022-06-07 | 38.89733 | -77.16248 | NA | NA |
1496271 | Stratford Hills | 1979-09-28 | 2022-06-07 | 38.90872 | -77.14053 | NA | NA |
1496293 | Tara | 1979-09-28 | 2022-06-07 | 38.89039 | -77.13498 | NA | NA |
1496368 | Walker Chapel | 1979-09-28 | 2022-06-07 | 38.92150 | -77.12942 | NA | NA |
1496386 | West Arlington | 1979-09-28 | 2022-06-07 | 38.89400 | -77.16831 | NA | NA |
1496394 | Westover | 1979-09-28 | 2022-06-07 | 38.88706 | -77.13942 | NA | NA |
1496421 | Williamsburg Village | 1979-09-28 | 2022-06-07 | 38.90511 | -77.15498 | NA | NA |
1496434 | Woodland Acres | 1979-09-28 | 2022-06-07 | 38.91261 | -77.14526 | NA | NA |
1499060 | Arlingwood | 1979-09-28 | 2022-06-07 | 38.92761 | -77.12192 | NA | NA |
1499086 | Ballston | 1979-09-28 | 2011-05-11 | 38.88011 | -77.11387 | NA | NA |
1499108 | Beechwood Hills | 1979-09-28 | 2022-06-07 | 38.90900 | -77.10998 | NA | NA |
1499116 | Bellevue Forest | 1979-09-28 | 2022-06-07 | 38.91428 | -77.11359 | NA | NA |
1499157 | Brandon Village | 1979-09-28 | 2022-06-07 | 38.87567 | -77.11581 | NA | NA |
1499172 | Broyhill Forest | 1979-09-28 | 2022-06-07 | 38.91539 | -77.12248 | NA | NA |
1499245 | Cherrydale | 1979-09-28 | 2022-06-07 | 38.89706 | -77.10831 | NA | NA |
1499266 | Clarendon | 1979-09-28 | 2022-06-07 | 38.88595 | -77.09692 | NA | NA |
1499290 | Colonial Village | 1979-09-28 | 2022-06-07 | 38.89317 | -77.08609 | NA | NA |
1499313 | Crystal Spring Knolls | 1979-09-28 | 2022-06-07 | 38.90344 | -77.10498 | NA | NA |
1499349 | Dominion Heights | 1979-09-28 | 2022-06-07 | 38.89289 | -77.10776 | NA | NA |
1499354 | Dover | 1979-09-28 | 2022-06-07 | 38.90678 | -77.10581 | NA | NA |
1499439 | Fort Myer Heights | 1979-09-28 | 2022-06-07 | 38.89206 | -77.07942 | NA | NA |
1499560 | Highlands | 1979-09-28 | 2022-06-07 | 38.89817 | -77.08303 | NA | NA |
1499652 | Lee Heights | 1979-09-28 | 2022-06-07 | 38.90206 | -77.11720 | NA | NA |
1499696 | Lyon Park | 1979-09-28 | 2022-06-07 | 38.88067 | -77.09026 | NA | NA |
1499697 | Lyon Village | 1979-09-28 | 2022-06-07 | 38.89483 | -77.09498 | NA | NA |
1499930 | Radnor Heights | 1979-09-28 | 2022-06-07 | 38.88900 | -77.07303 | NA | NA |
1499964 | Rivercrest | 1979-09-28 | 2022-06-07 | 38.92206 | -77.11915 | NA | NA |
1499969 | Riverwood | 1979-09-28 | 2022-06-07 | 38.90539 | -77.10248 | NA | NA |
1499990 | Rosslyn | 1979-09-28 | 2022-06-07 | 38.89678 | -77.07248 | NA | NA |
1500349 | Woodmont | 1979-09-28 | 2022-06-07 | 38.90067 | -77.09498 | NA | NA |
1779110 | Brockwood | 1998-02-05 | 2022-06-07 | 38.87761 | -77.12887 | NA | NA |
1779112 | Country Club Grove | 1998-02-05 | 2022-06-07 | 38.91956 | -77.12942 | NA | NA |
1779118 | East Arlington (historical) | 1998-02-05 | 2022-06-07 | 38.87345 | -77.06220 | NA | NA |
1779119 | Green Valley | 1998-02-05 | 2022-06-07 | 38.85511 | -77.08859 | NA | NA |
1779147 | Millburn Terrace | 1998-02-05 | 2022-06-07 | 38.90067 | -77.13831 | NA | NA |
1783506 | Arlington | 1998-03-02 | 2022-06-07 | 38.89039 | -77.08414 | NA | NA |
2646878 | Crystal City | 2010-08-26 | 2018-11-14 | 38.85535 | -77.05090 | NA | NA |
There is no historical or descriptive data for any of the features in Arlington, even though many of these are historical sites or are otherwise of interest. I’d like to augment this data with some context. Arlington has 23 neighborhoods on the National Register of Historic Places. The National Register has scanned applications available for post-2012 listings, but most of the historic neighborhoods were designated before that. The National Register also has a spreadsheet with links to the National Archives, which holds the pre-2012 applications.
I normally like to use tidyverse packages, but readxl’s read_excel() won’t read directly from a URL. There are workarounds, but it is easier just to use the openxlsx package. Its read.xlsx() function works as you’d expect, and you can specify the sheet to read in with the sheet argument.
national_historic <- read.xlsx(
  'https://www.nps.gov/subjects/nationalregister/upload/national-register-listed-20230119.xlsx',
  sheet = 1
)
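For comparison, the usual workaround for readxl is to download the workbook to a temporary file and read that. Below is a minimal sketch; read_excel_url() is a hypothetical helper (not part of readxl), and the demo writes a tiny local workbook and reads it back through a file:// URL so it runs offline.

```r
# Hypothetical helper: readxl::read_excel() needs a local path,
# so download the workbook to a temporary file first.
read_excel_url <- function(url, ...) {
  tf <- tempfile(fileext = ".xlsx")
  on.exit(unlink(tf), add = TRUE)                    # remove the temp file afterwards
  download.file(url, tf, mode = "wb", quiet = TRUE)  # "wb" keeps the binary intact on Windows
  readxl::read_excel(tf, ...)
}

# Offline demo: write a small workbook locally, then read it back via a
# file:// URL (this URL form assumes a Unix-style absolute path).
local_xlsx <- tempfile(fileext = ".xlsx")
openxlsx::write.xlsx(
  data.frame(State = c("VIRGINIA", "VIRGINIA"), County = c("Arlington", "Fairfax")),
  local_xlsx
)
demo <- read_excel_url(paste0("file://", normalizePath(local_xlsx)))
demo
```

With the real spreadsheet, the call would look like read_excel_url('https://www.nps.gov/subjects/nationalregister/upload/national-register-listed-20230119.xlsx', sheet = 1). Note that readxl keeps the original column names (with spaces) rather than the dotted names openxlsx produces, so the later filters would need adjusting.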
Next, I keep only my local historic sites. This dataset is annoying because some columns are in all caps (like State), while others are in title case (like City and County). Some, like the building category, use both. To use the entire dataset, some string cleaning and formatting might be necessary, but for this case I don’t need it.
arlington_historic <- national_historic %>%
  filter(State == "VIRGINIA" & County == "Arlington")
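If the mixed capitalization ever did matter (for example, filtering on a column where both forms appear), one option is to normalize the case before filtering. A minimal sketch on made-up rows, using stringr’s str_to_upper(); the column names mirror the National Register spreadsheet, but the values are hypothetical:

```r
library(dplyr)
library(stringr)

# Hypothetical rows with inconsistent capitalization
sites <- tibble(
  State  = c("VIRGINIA", "Virginia", "MARYLAND"),
  County = c("Arlington", "ARLINGTON", "Montgomery")
)

# Upper-case both columns first so one exact comparison catches every variant
arlington_sites <- sites %>%
  mutate(across(c(State, County), str_to_upper)) %>%
  filter(State == "VIRGINIA", County == "ARLINGTON")

arlington_sites
```

An exact filter on the raw values (State == "VIRGINIA" & County == "Arlington") would keep only the first row here; normalizing first keeps both Arlington rows.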
Looking at the data, neighborhoods seem to be encoded as districts.
arlington_historic_districts <- arlington_historic %>%
  filter(Category.of.Property == "DISTRICT")
Arlington County has a website listing historic neighborhoods, and I know there should be 23. The National Register has 29 local entries. I should also note that only 17 of the Arlington neighborhoods appeared in our place names dataset.
On to figuring out what the extra entries are. Apparently forts are also encoded as districts, and there are also separate applications for boundary increases. To drop them, I use the stringr function str_detect() to find “Boundary Increase” and “Fort”, with the negate = TRUE flag to return everything that doesn’t match.
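Here is a toy illustration of the negated filters before applying them to the real data. The names are made up, and the single-regex version is just an equivalent alternative, not what the post uses:

```r
library(dplyr)
library(stringr)

# Made-up names illustrating the three kinds of entries
districts <- tibble(Property.Name = c(
  "Lyon Park Historic District",
  "Lyon Park Historic District (Boundary Increase)",
  "Fort Example"
))

# Two negated filters, one per pattern
two_filters <- districts %>%
  filter(str_detect(Property.Name, "Boundary Increase", negate = TRUE)) %>%
  filter(str_detect(Property.Name, "Fort", negate = TRUE))

# Equivalent single filter using regex alternation
one_filter <- districts %>%
  filter(str_detect(Property.Name, "Boundary Increase|Fort", negate = TRUE))

two_filters$Property.Name
```

Because str_detect() treats the pattern as a regular expression, “Fort” matches any name containing those letters; a word boundary such as "\\bFort\\b" is stricter if that ever matters.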
arlington_historic_districts2 <- arlington_historic_districts %>%
  filter(str_detect(Property.Name, "Boundary Increase", negate = TRUE)) %>%
  filter(str_detect(Property.Name, "Fort", negate = TRUE))

arlington_historic_districts2 %>% gt()
Reference.number | Property.Name | Status | Request.Type | Restricted.Address | Category.of.Property | State | County | City | Street.&.Number | External.Link | Federal.Agencies | Level.of.Significance.-.International | Level.of.Significance.-.Local | Level.of.Significance.-.National | Level.of.Significance.-.Not.Indicated | Level.of.Significance.-.State | Listed.Date | Name.of.Multiple.Property.Listing | NHL.Designated.Date | Other.Names | Park.Name | Status.Date | Area.of.Significance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
_05001344 | Arlington Forest Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Bounded by Carlin Springs Rd., George Mason Dr., Henderson Rd., Aberdeen St., Columbus St., Granada, Galveston and 2nd | https://catalog.archives.gov/id/77834749 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 38688 | NA | NA | VDHR File No.000-7808 | NA | 38688 | ARCHITECTURE; COMMUNICATIONS |
_08000063 | Arlington Heights Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Bounded by Arlington Blvd., S. Fillmore St., S. Walter Reed Dr., columbia Pk., & S. Glebe Rd. | https://catalog.archives.gov/id/41678540 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 39499 | Garden Apartments, Apartment Houses and Apartment Complexes in Arlington County, Virginia MPS | NA | 000-3383 | NA | 39499 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_14000146 | Arlington National Cemetery Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | 1 Memorial Ave. | NA | DEPARTMENT OF THE ARMY | FALSE | FALSE | TRUE | FALSE | FALSE | 41740 | NA | NA | Arlington National Cemetery; DHR #000-0042 | NA | 41740 | MILITARY; LANDSCAPE ARCHITECTURE; POLITICS/GOVERNMENT; ARCHITECTURE |
_03000215 | Arlington Village Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | S 13th St., S 13 Rd., S 16th St., S Barton S., S. Cleveland St. and Edgewood St. | https://catalog.archives.gov/id/41679618 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 37722 | NA | NA | 000-0024 | NA | 37722 | COMMUNITY PLANNING AND DEVELOPMENT |
_03000561 | Ashton Heights Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by Wilson Bvd., N. Irving St., Arlington Bvd., N. Oxford St., N. Piedmont & N. Oakland Sts. | https://catalog.archives.gov/id/41679598 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 37795 | NA | NA | 000-7819 | NA | 37795 | ARCHITECTURE; COMMERCE |
_08001018 | Aurora Highlands Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Bounded by 16th St. S., S. Eads St., 26th St. S., and S. Joyce St. | https://catalog.archives.gov/id/77834759 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 39743 | NA | NA | 000-9706 | NA | 39743 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_98001649 | Buckingham Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by N. 5th, N. Oxford, and N. 2nd Sts., and N. Glebe Rd. | https://catalog.archives.gov/id/41679602 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 36181 | NA | NA | DHR File # 00-0025 | NA | 36181 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT; LANDSCAPE ARCHITECTURE |
_03000461 | Cherrydale Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by Lorcom Ln., N. Utah and N. Taylor Sts., and I-66 | https://catalog.archives.gov/id/41679592 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 37763 | NA | NA | VDHR File Number 000-7821 | NA | 37763 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_06000751 | Claremont Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Bounded by S. Dinwiddie St., S. Chesterfield Rd., S. Buchanan St., 25th St. S, 24th St. S, 23rd St. S and 22nd St. S | https://catalog.archives.gov/id/77834757 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 38960 | NA | NA | 000-9700 | NA | 38960 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_04000047 | Columbia Forest Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Bounded by 11th, S. Edison, S. Dinwiddie, S. Columbus, S. George Mason, and S. Frederick St. | https://catalog.archives.gov/id/41679620 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 38028 | NA | NA | VDHR # 000-9416 | NA | 38028 | COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE |
_12000239 | Dominion Hills Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by N. Four Mile Run Dr., N. McKinley Rd., N. Larrimore, N. Madison, N. Montana Sts., & 9th St. N. | https://catalog.archives.gov/id/77834753 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 41023 | Historic Residential Suburbs in the United States, 1830-1960 MPS | NA | VDHR FILE NUMBER: 000-4212 | NA | 41023 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_99000368 | Fairlington Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by Quaker Lane, King St., I-395, S. Walter Reed Dr., and S. Abingdon St. | https://catalog.archives.gov/id/41679636 | NA | FALSE | FALSE | TRUE | FALSE | FALSE | 36248 | NA | NA | DHR File No. 000-5772 | NA | 36248 | MILITARY; COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE |
_04000049 | Glebewood Village Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | N. Brandywine St. Bet. Lee Hwy and 10th Place N, 21St Rd. bet. N. Brandywine St. and N. Glebe Rd. | https://catalog.archives.gov/id/41679622 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 38028 | NA | NA | 000-9414 | NA | 38028 | COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE |
_08000910 | Glencarlyn Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Bounded by S. Carlin Springs Rd., Arlington Blvd., 5th Rd. S., Glencarlyn Park | https://catalog.archives.gov/id/77834761 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 39709 | NA | NA | 000-9704 | NA | 39709 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_11000548 | Highland Park-Overlee Knolls | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by 22nd St. N., N. Lexington St., 16th St. N., N. Longfellow St., McKinley Rd., I-66 & N. Quantico St. | https://catalog.archives.gov/id/77834763 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 40773 | Historic Residential Suburbs in the United States, 1830-1960 MPS | NA | Fostoria/VDHR File Number OOO-9703 | NA | 40773 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_04000109 | Lee Gardens North Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | 2300-2341 N. 11th St. | https://catalog.archives.gov/id/41678536 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 38043 | Garden Apartments, Apartment Houses and Apartment Complexes in Arlington County, Virginia MPS | NA | 000-9411; Woodbury Park Apartments | NA | 38043 | COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE |
_03000437 | Lyon Park Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by 10th St. N, Arlington Blvd., and N. Irving St. | https://catalog.archives.gov/id/41679594 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 37937 | NA | NA | 000-7820 | NA | 37937 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_02000512 | Lyon Village Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by Lee Hwy, N. Veitch St., N. Franklin Rd., N. Highland St., N. Fillmore St., and N. Kirkwood Rd. | https://catalog.archives.gov/id/41679590 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 37386 | NA | NA | VDHR File No. 000-7822 | NA | 37386 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_03000460 | Maywood Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by Lorcom Ln., Spout Run Parkway, I-66, Lee Highway, N. Oakland St., N. Nelson St., and N. Lincoln St. | https://catalog.archives.gov/id/41679596 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 37763 | NA | NA | VDHR File Number 000-5056 | NA | 37763 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_08000064 | Monroe Courts Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | 1041-1067 N. Nelson and 1036-1062 & 1033-1055 N. Monroe Sts. | https://catalog.archives.gov/id/77834751 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 39499 | NA | NA | 000-4105 | NA | 39499 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_04000112 | Penrose Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by Arlington Blvd., S. Courthouse Rd., S. Fillmore St., S. Barton St. S, and Columbia Pike | https://catalog.archives.gov/id/41679600 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 38306 | NA | NA | VDHR File Number 000-8823 | NA | 38306 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT; BLACK |
_08000065 | Virginia Heights Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Bounded by 10th Pl. S., S. Frederick St. & S. George Mason Dr. | https://catalog.archives.gov/id/77834755 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 39499 | NA | NA | 000-9701 | NA | 39499 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
_03000451 | Walter Reed Gardens Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | 2900-2906 13th St. S, 2900-2914 13th Rd S, 1301-1319 S. Walter Reed Dr. | https://catalog.archives.gov/id/41678548 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 37763 | Garden Apartments, Apartment Houses and Apartment Complexes in Arlington County, Virginia MPS | NA | Commons of Arlington; 000-8824 | NA | 37763 | COMMUNITY PLANNING AND DEVELOPMENT |
_04000111 | Waverly Hills Historic District | Listed | Single | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by 20th Rd. N, N. Utah St, I-66, N. Glebe Rd. and N. Vermont St. | https://catalog.archives.gov/id/41679624 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 38043 | NA | NA | VDHR File Number 000-9413 | NA | 38043 | COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE |
_06000345 | Westover Historic District | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Bounded by McKinley Rd., N. Washington Blvd., N. 16th St., N. Jefferson St., N. 11th St. and N. Fairfax Dr. | https://catalog.archives.gov/id/41678538 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 38839 | Garden Apartments, Apartment Houses and Apartment Complexes in Arlington County, Virginia MPS | NA | 000-0032 | NA | 38839 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
I still have too many entries. It turns out that Arlington National Cemetery is also encoded as a DISTRICT. There is also an entry for Walter Reed Gardens Historic District, which Arlington County lists as a building on its site (and other entries, like Calvert Manor, are noted as buildings in the National Register).
I could remove these two items manually, but they will be removed when I join the data to the place names dataset, since neither one appears in the populated place names.
Joining the two datasets will require some sort of string manipulation, since the place names are not the same. The place names dataset contains just the place names (“Addison Heights”), while the historic sites data has the phrase “Historic District” appended to the end. In addition, some place names don’t exactly match the historic district names (“Overlee Knolls” and “Highland Park-Overlee Knolls”).
So I want to do some fuzzy matching and luckily (of course!) there is an R package for that.
However, the populated place names data contains “Arlington”, which will match a ton of different neighborhoods (Arlington Forest, Arlington Heights, etc.), so I’m going to change Arlington to Arlington County.
va_joined2 <- va_joined %>%
  mutate(feature_name = ifelse(feature_name == "Arlington", "Arlington County", feature_name))
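A quick check of why the rename matters, using a few district names copied from the National Register table above:

```r
library(stringr)

# A few district names from the National Register table
district_names <- c(
  "Arlington Forest Historic District",
  "Arlington Heights Historic District",
  "Arlington Village Historic District",
  "Lyon Park Historic District"
)

# The bare place name is a substring of several districts...
n_bare <- sum(str_detect(district_names, "Arlington"))
# ...while the renamed value matches none of them
n_renamed <- sum(str_detect(district_names, "Arlington County"))

c(bare = n_bare, renamed = n_renamed)
```

Here the bare “Arlington” pattern hits three districts, while “Arlington County” hits none.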
I also know that North and South Fairlington, while separate places in the populated place names, are a single historic district called Fairlington. I’m going to add both a North and a South Fairlington entry to the historic sites dataframe. I’m not removing the original Fairlington entry, because I know it will be filtered out by my join later. But this is the kind of thing that can lead to errors or extraneous entries, so if you do something like this, make sure you clean it up later.
south_fairlington <- arlington_historic_districts2 %>%
  filter(Property.Name == "Fairlington Historic District") %>%
  mutate(Property.Name = "South Fairlington")
north_fairlington <- arlington_historic_districts2 %>%
  filter(Property.Name == "Fairlington Historic District") %>%
  mutate(Property.Name = "North Fairlington")
arlington_historic_districts3 <- arlington_historic_districts2 %>%
  rbind(south_fairlington) %>%
  rbind(north_fairlington)
Okay, on to fuzzy joining. The name from the populated place names dataset should be a substring of the name in the historic district dataset. I’m going to illustrate this in a very simple way using str_detect(). “Overlee Knolls” is the first entry in the populated places dataset, so I’ll use it as the pattern to search for in the historic places dataset. The expected neighborhood is “Highland Park-Overlee Knolls”.
va_joined2$feature_name[1]
[1] "Overlee Knolls"
arlington_historic_districts %>%
  filter(str_detect(Property.Name, va_joined2$feature_name[1])) %>%
  gt()
Reference.number | Property.Name | Status | Request.Type | Restricted.Address | Category.of.Property | State | County | City | Street.&.Number | External.Link | Federal.Agencies | Level.of.Significance.-.International | Level.of.Significance.-.Local | Level.of.Significance.-.National | Level.of.Significance.-.Not.Indicated | Level.of.Significance.-.State | Listed.Date | Name.of.Multiple.Property.Listing | NHL.Designated.Date | Other.Names | Park.Name | Status.Date | Area.of.Significance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
_11000548 | Highland Park-Overlee Knolls | Listed | Multiple | FALSE | DISTRICT | VIRGINIA | Arlington | Arlington | Roughly bounded by 22nd St. N., N. Lexington St., 16th St. N., N. Longfellow St., McKinley Rd., I-66 & N. Quantico St. | https://catalog.archives.gov/id/77834763 | NA | FALSE | TRUE | FALSE | FALSE | FALSE | 40773 | Historic Residential Suburbs in the United States, 1830-1960 MPS | NA | Fostoria/VDHR File Number OOO-9703 | NA | 40773 | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT |
I’ve decided I only want to look at the historic areas that appear in the populated place names. I’m choosing an inner join, so I will only get entries that exist in BOTH the populated places and the historic register. This should be 17 items (from manually comparing the populated places to the Arlington County website). I’m going to map these places on top of current Arlington County neighborhood groups/civic associations, because I’m interested in how current neighborhoods compare to the historic districts. (Note that I could have done this without the populated places dataset at all, but this is the TidyTuesday dataset and it is what led me to my question.)
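To see what an inner join throws away, fuzzyjoin’s anti-join variant returns the left-hand rows with no match. A minimal sketch on toy versions of the two tables (names taken from the data above); using fuzzy_anti_join() as a check is my suggestion, not a step from the original workflow:

```r
library(dplyr)
library(stringr)
library(fuzzyjoin)

# Toy versions of the two tables
historic <- tibble(Property.Name = c(
  "Lyon Park Historic District",
  "Arlington National Cemetery Historic District",
  "Walter Reed Gardens Historic District"
))
places <- tibble(feature_name = c("Lyon Park", "Arlington County"))

# Rows whose district name contains a populated place name
matched <- fuzzy_inner_join(historic, places,
                            by = c("Property.Name" = "feature_name"),
                            match_fun = str_detect)

# Districts with no populated-place match, i.e. what the inner join drops
unmatched <- fuzzy_anti_join(historic, places,
                             by = c("Property.Name" = "feature_name"),
                             match_fun = str_detect)

unmatched$Property.Name
```

On the real data, the anti-join output is an easy way to confirm that only the expected districts (such as the cemetery and Walter Reed Gardens) fall out.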
There are a few different ways to use fuzzyjoin. I found this discussion on Stack Overflow to be a good starting point. I chose the match_fun version, since I had already prototyped with str_detect(). The only thing that wasn’t clear to me was which dataframe would be sent to str_detect() as the pattern and which as the string. That is, for
fuzzy_inner_join(x, y, by = c("name1" = "name2"), match_fun = str_detect)
would I get
str_detect(string = x$name1, pattern = y$name2)
or
str_detect(string = y$name2, pattern = x$name1)?
Maybe it is clear to others from the Stack Overflow example or the fuzzyjoin manual, but it wasn’t clear to me, so I ended up trying it both ways. It turns out that the columns are passed to str_detect() in the order the dataframes are listed, which makes sense (and is probably the convention, but I had never seen it explicitly stated). To be absolutely clear, what happens is the first case: str_detect(string = x$name1, pattern = y$name2).
historic_pop_places <- arlington_historic_districts3 %>%
  fuzzy_inner_join(va_joined2,
                   by = c("Property.Name" = "feature_name"),
                   match_fun = str_detect)
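The argument-order question can also be checked with a tiny toy join: if x’s column is the string and y’s column is the pattern, swapping the tables should change the result. The data frames x and y below are hypothetical, with names borrowed from the table above:

```r
library(dplyr)
library(stringr)
library(fuzzyjoin)

x <- tibble(name1 = c("Lyon Park Historic District", "Glebewood Village Historic District"))
y <- tibble(name2 = c("Lyon Park", "Glencarlyn"))

# x listed first: str_detect(string = x$name1, pattern = y$name2)
xy <- fuzzy_inner_join(x, y, by = c("name1" = "name2"), match_fun = str_detect)

# Tables swapped: now the short names are the strings, and "Lyon Park"
# does not contain the longer district name, so nothing matches
yx <- fuzzy_inner_join(y, x, by = c("name2" = "name1"), match_fun = str_detect)

c(xy = nrow(xy), yx = nrow(yx))
```

The first join finds the single Lyon Park match and the swapped one finds none, which is consistent with the first-listed table supplying the string.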
For what I plan to do, I need the place name and the location. I also want the reason the place is important and a link to the historic registry application: I started this project wanting to know why these places were important! I’m leaving in both sets of place names, just so I can visually check that my dataset is correct.
historic_pop_places <- historic_pop_places %>%
  select(
    Property.Name, feature_name, Area.of.Significance,
    prim_lat_dec, prim_long_dec, External.Link
  )
gt(historic_pop_places)
Property.Name | feature_name | Area.of.Significance | prim_lat_dec | prim_long_dec | External.Link |
---|---|---|---|---|---|
Arlington Forest Historic District | Arlington Forest | ARCHITECTURE; COMMUNICATIONS | 38.86872 | -77.11303 | https://catalog.archives.gov/id/77834749 |
Arlington Heights Historic District | Arlington Heights | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.86956 | -77.09220 | https://catalog.archives.gov/id/41678540 |
Arlington Village Historic District | Arlington Village | COMMUNITY PLANNING AND DEVELOPMENT | 38.86178 | -77.08526 | https://catalog.archives.gov/id/41679618 |
Aurora Highlands Historic District | Highlands | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.89817 | -77.08303 | https://catalog.archives.gov/id/77834759 |
Buckingham Historic District | Buckingham | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT; LANDSCAPE ARCHITECTURE | 38.87345 | -77.10665 | https://catalog.archives.gov/id/41679602 |
Cherrydale Historic District | Cherrydale | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.89706 | -77.10831 | https://catalog.archives.gov/id/41679592 |
Claremont Historic District | Claremont | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.84317 | -77.10470 | https://catalog.archives.gov/id/77834757 |
Columbia Forest Historic District | Columbia Forest | COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE | 38.85400 | -77.11026 | https://catalog.archives.gov/id/41679620 |
Dominion Hills Historic District | Dominion Hills | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.87595 | -77.14109 | https://catalog.archives.gov/id/77834753 |
Glencarlyn Historic District | Glencarlyn | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.86178 | -77.12915 | https://catalog.archives.gov/id/77834761 |
Highland Park-Overlee Knolls | Overlee Knolls | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.88956 | -77.14776 | https://catalog.archives.gov/id/77834763 |
Lyon Park Historic District | Lyon Park | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.88067 | -77.09026 | https://catalog.archives.gov/id/41679594 |
Lyon Village Historic District | Lyon Village | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.89483 | -77.09498 | https://catalog.archives.gov/id/41679590 |
Virginia Heights Historic District | Virginia Heights | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.85095 | -77.11637 | https://catalog.archives.gov/id/77834755 |
Westover Historic District | Westover | ARCHITECTURE; COMMUNITY PLANNING AND DEVELOPMENT | 38.88706 | -77.13942 | https://catalog.archives.gov/id/41678538 |
South Fairlington | South Fairlington | MILITARY; COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE | 38.83261 | -77.08970 | https://catalog.archives.gov/id/41679636 |
North Fairlington | North Fairlington | MILITARY; COMMUNITY PLANNING AND DEVELOPMENT; ARCHITECTURE | 38.83650 | -77.09720 | https://catalog.archives.gov/id/41679636 |
Aurora Highlands and Highlands are the same place: the description of Aurora Highlands on Wikipedia matches the description in the application for the National Register of Historic Places.
Next, I found a map of all the civic associations in Arlington on the county’s open data page. The data can be downloaded in a variety of formats, including shapefiles and GeoJSON. I chose to download the shapefile and extracted the zip file into my project directory (not shown).
The R Graph Gallery (which is a great resource and source of inspiration) has a great section on mapping, but unfortunately one of the packages it relies on, rgdal, is being retired. The code below still works, but you will get a very long message telling you to migrate away from rgdal.
# library(sp)
# library(rgdal)
# my_spdf <- readOGR(
#   dsn = "Civic_poly.shp",
#   verbose = FALSE
# )
So, here is another way to read in the shapefile, using the sf package. It contains the polygons that define the boundaries of modern neighborhoods in Arlington. There are a lot of neighborhoods!
arlington_polygons <- st_read(dsn = "Civic_poly.shp")
Mapping points (which is what the TidyTuesday dataset provides: the lat/long of the “official feature location”) together with polygons from the Arlington County dataset involved a few steps. Shapefiles can be encoded using different coordinate reference systems (CRS), and care needs to be taken that all the map layers use the same CRS. I found the mapview package invaluable during this process, as it makes it simple to create an interactive map. This made troubleshooting incredibly easy.
Generally, the first step in handling shapefiles in R is to convert them to simple features (sf) objects. Here, I’m using the sf package. With a shapefile, you generally don’t need to pass the coordinates or CRS, since that information is stored with the file (the CRS lives in the accompanying .prj file) and is picked up automatically. (st_read() already returns an sf object, so the st_as_sf() call below is effectively a no-op, but it does no harm.)
arlington_polygons_sf <- st_as_sf(arlington_polygons)
mapview(arlington_polygons_sf)
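The points need the same treatment. Below is a minimal sketch of turning the lat/long columns into an sf object and reprojecting it, using two rows copied from the joined table. Two assumptions to flag: EPSG:4326 (WGS84) is assumed as the points’ CRS (GNIS coordinates may instead be referenced to NAD83, EPSG:4269, a negligible difference at this map scale), and EPSG:3857 is only a stand-in target; with the real data you would use st_crs(arlington_polygons_sf).

```r
library(sf)

# Two rows copied from the joined table above
places <- data.frame(
  feature_name  = c("Lyon Park", "Westover"),
  prim_lat_dec  = c(38.88067, 38.88706),
  prim_long_dec = c(-77.09026, -77.13942)
)

# coords are given as c(x, y): longitude first, then latitude.
# EPSG:4326 is an assumption about the source CRS.
places_sf <- st_as_sf(places,
                      coords = c("prim_long_dec", "prim_lat_dec"),
                      crs = 4326)

# Reproject so both layers share one CRS. EPSG:3857 is a stand-in;
# with the real data, use st_crs(arlington_polygons_sf) instead.
target_crs <- st_crs(3857)
places_proj <- st_transform(places_sf, target_crs)

st_crs(places_proj) == target_crs
```

Putting latitude first in coords is an easy mistake that silently drops the points somewhere in Antarctica, so it’s worth a quick look with something like mapview(arlington_polygons_sf) + mapview(places_proj).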